Abstract

Background: Flower color of soybean is primarily controlled by six genes, viz., W1, W2, W3, W4, Wm and Wp. This
study was conducted to investigate the genetic and chemical basis of newly identified flower color variants,
including two soybean mutant lines, 222-A-3 (near white flower) and E30-D-1 (light purple flower), a near-isogenic
line (Clark-w4), flower color variants (T321 and T369) descended from the w4-mutable line, and kw4 (near white
flower, Glycine soja).

Results: Complementation tests revealed that the flower color of 222-A-3 and kw4 was controlled by the recessive
allele (w4) of the W4 locus encoding dihydroflavonol 4-reductase 2 (DFR2). In 222-A-3, a single base was deleted in
the first exon, resulting in a truncated polypeptide consisting of 24 amino acids. In Clark-w4, base substitution of the
first nucleotide of the fourth intron abolished the 5' splice site, resulting in the retention of the intron. The DFR2
gene of kw4 was not expressed. The above results suggest that complete loss-of-function of the DFR2 gene leads to
near white flowers. Light purple flower of E30-D-1 was controlled by a new allele at the W4 locus, w4-lp. The gene
symbol was approved by the Soybean Genetics Committee. In E30-D-1, a single-base substitution changed an
amino acid at position 39 from arginine to histidine. Pale flowers of T369 had higher expression levels of the DFR2
gene. These flower petals contained unique dihydroflavonols that have not yet been reported to occur in soybean
and G. soja.

Conclusions: Complete loss-of-function of the DFR2 gene leads to near white flowers. A new allele of the W4 locus,
w4-lp, regulates light purple flowers. A single amino acid substitution was associated with light purple flowers. Flower
petals of T369 had higher levels of DFR2 gene expression and contained unique dihydroflavonols that are absent in
soybean and G. soja. Thus, mutants of the DFR2 gene have unique flavonoid compositions and display a wide
variety of flower color patterns in soybean, from near white, light purple and dilute purple to pale.
Background

Flower color of soybean (Glycine max (L.) Merr.) is primarily controlled by six genes (W1, W2, W3, W4, Wm
and Wp) [1,2]. Under the W1 genotype, soybean genotypes with W3W4 have dark purple, those with W3w4 have
dilute purple or purple throat, those with w3W4 have purple, and those with w3w4 have near white flowers [3].
Flower color of genotypes with the allelic combination W1w3w4 was indistinguishable from those with white
flowers under many environments, suggesting that environments affect flower color under the allelic
combination [3]. W3 and W4 encode dihydroflavonol 4-reductase (DFR) [4,5]. W1, W2, Wm and Wp encode
flavonoid 3'5'-hydroxylase, MYB transcription factor, flavonol synthase and flavanone 3-hydroxylase,
respectively [6-10]. The roles of these genes in the biosynthesis of anthocyanin and flavonol are presented
in Figure 1.
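
The relationship between W3/W4 genotype and flower color summarized above can be written as a small lookup table. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the mapping assumes a W1 background and homozygous genotypes, as described in the text.

```python
# Illustrative lookup of soybean flower color by W3/W4 genotype under W1,
# following the relationships summarized in the text [3].
# FLOWER_COLOR_UNDER_W1 and flower_color are hypothetical names.

FLOWER_COLOR_UNDER_W1 = {
    ("W3", "W4"): "dark purple",
    ("W3", "w4"): "dilute purple or purple throat",
    ("w3", "W4"): "purple",
    ("w3", "w4"): "near white",
}


def flower_color(w3_allele: str, w4_allele: str) -> str:
    """Return the flower color reported for a W3/W4 allele combination."""
    return FLOWER_COLOR_UNDER_W1[(w3_allele, w4_allele)]


if __name__ == "__main__":
    # Clark and Bay carry w3 and W4 in the genotypes given in this paper.
    print(flower_color("w3", "W4"))
```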

The flavonoids in flower petals of soybean were analyzed [11-13]. The primary components of anthocyanin were
malvidin 3,5-di-O-glucoside, petunidin 3,5-di-O-glucoside, delphinidin 3,5-di-O-glucoside and delphinidin
3-O-glucoside. In addition, eight flavonol glycosides, kaempferol 3-O-gentiobioside, kaempferol 3-O-rutinoside,
kaempferol 3-O-glucoside, kaempferol 3-O-glycoside, kaempferol 3-O-rhamnosyl-(1→2)-[glucosyl-(1→6)-galactoside],
kaempferol 7-O-glucoside, kaempferol 7-O-diglucoside and quercetin 3-O-gentiobioside, and one dihydroflavonol,
aromadendrin 3-O-glucoside, were identified. No anthocyanins were detected in Clark-w1, a near-isogenic line (NIL)
of US cultivar Clark at the W1 locus. Anthocyanins were not detected in Clark-w4 in 2003 and 2004, but trace
amounts were detected in 2007 [11,12], indicating slight responsiveness to environmental conditions, in agreement
with the previous report [3].

Figure 1 Schematic diagram of the anthocyanin and flavonol biosynthetic pathways. Enzyme names are abbreviated as follows: chalcone
synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), dihydroflavonol 4-reductase (DFR), flavonoid 3'-hydroxylase (F3'H), flavonoid
3'5'-hydroxylase (F3'5'H), flavonol synthase (FLS), leucoanthocyanidin dioxygenase (LDOX). Soybean flower color genes encoding the enzymes are in
italic font.

A mutable allele of the W4 locus was discovered in a cross between two experimental lines with white and purple
flowers, respectively [14]. The mutant line was designated as T322, and the mutable allele was designated as
w4-m. Mutant lines T321 with the w4-dp allele (dilute purple flower) and T369 with the w4-p allele (pale flower)
were isolated from descendants of T322 [15,16] (Figure 2). A 20.5-kb transposable element (Tgm9) was isolated
from the second intron of the DFR2 gene [5]. In T321 and T369, Tgm9 was excised from the second intron, leaving
behind 4- and 0-bp footprints, respectively [5]. A 5' end fragment of Tgm9 (944 bp) was integrated at a position
1043 bp upstream of the transcription start site in T321. A fragment of Tgm9 was inserted at a position 1034 bp
upstream of the transcription start site in T369. Soybean has two other DFR genes, DFR1 and DFR3 [17].
DNA marker analysis suggested that the W3 locus might correspond to DFR1 [17].
Figure 2 Banner petals of flower color variants of soybean and Glycine soja.

Flower color of the wild relative of soybean, Glycine soja Sieb. & Zucc., is almost exclusively purple; by contrast,
33% (5,544 out of 16,855) of the soybean accessions in the USDA Soybean Germplasm Collections have white
flowers (Dr. R.L. Nelson, personal communication 2006). One white-flowered plant was found in 1998 among the
progeny of a purple-flowered G. soja accession that was introduced from South Korea [18]. Genetic analysis
indicated that the white flower was caused by a recessive allele at the W1 locus, similar to the white-flowered
soybeans [18]. The mutation may have occurred during propagation at USDA. In 2002, a variant with light purple
flowers, B09121, was discovered in southern Japan [13]. Genetic analysis suggested that the light purple color was
controlled by a new allele at the W1 locus, w1-lp. Flower petals of B09121 contained lower amounts of the four
major anthocyanins common in purple flowers and contained small amounts of the 5'-unsubstituted versions of the
abovementioned anthocyanins, peonidin 3,5-di-O-glucoside, cyanidin 3,5-di-O-glucoside and cyanidin
3-O-glucoside [13]. B09121 may be the first example of a flower color variant of G. soja found in nature.

Lines varying in flower color were obtained from mutagenized populations of US cultivar Bay. Line 222-A-3
with near white flowers was isolated from an X-ray-treated population, whereas line E30-D-1 with light purple
flowers was developed from an EMS-treated population [19] (Figure 2). Dr. Donghe Xu (JIRCAS, Japan)
found kw4, a G. soja accession with near white flowers, among accessions introduced from South Korea
(personal communication, 2007) (Figure 2). It is unknown whether the accession has near white flowers in its
natural habitat. This study was conducted to investigate the genetic and chemical basis of flower color variants
in G. max (Clark-w4, T321, T369, 222-A-3 and E30-D-1) and in G. soja (kw4).

Methods 

Genetic analysis 

The plant materials used in this study are listed in Table 1. US cultivars Clark and Bay have purple flowers
(W1W2w3W4WmWp). Clark has tawny (T) and Bay has gray (t) pubescence. 222-A-3 and E30-D-1 were crossed
with Clark-w4 (L68-1774, near white flower and tawny pubescence, W1W2w3w4WmWpT). E30-D-1 was also
crossed with Clark. Flowers of 222-A-3 and E30-D-1 were emasculated one day before opening and fertilized
with pollen from Clark or Clark-w4. A NIL of the Canadian cultivar Harosoy with the w4 allele, Harosoy-w4
(L72-1138, near white flower and gray pubescence, W1W2w3w4WmWpt), was crossed with kw4 (tawny pubescence).
Hybridity of the F1 plants was ascertained by tawny pubescence color. Seeds of the NILs, T321 and T369 were
provided by the USDA Soybean Germplasm Collection. The NILs were developed by backcrossing the near
white flower trait six times from the cultivar Laredo into Clark or Harosoy (Table 1) [20].

A total of seven F1 and 130 F2 seeds derived from Harosoy-w4 × kw4 were field-planted on June 12 in 2009
at the National Institute of Crop Science, Tsukuba, Japan (36°06'N, 140°05'E). Similar numbers of F1 and F2 seeds
derived from two crosses (222-A-3 × Clark-w4 and E30-D-1 × Clark-w4) were planted on June 10 in 2010. A bulk of
30 seeds each of fifty F3 families derived from E30-D-1 × Clark-w4 was planted on June 7 in 2012 and June 10 in
2013. A total of six F1 and 130 F2 seeds derived from E30-D-1 × Clark were planted on June 10 in 2013. N, P and K
were applied at 3.0, 4.4 and 8.3 g m⁻², respectively. Plants
Table 1 Plant materials of soybean and Glycine soja used in this study

| Line | Flower color | Genotype | Origin | Cross combination (Year of crossing) |
|---|---|---|---|---|
| Clark | Purple | W1W2w3W4WmWpT | - | - |
| Bay | Purple | W1W2w3W4WmWpt | - | - |
| L68-1774 (Clark-w4) | Near white | W1W2w3w4WmWpT | L6^a (6) × (Laredo × Harosoy) | - |
| L72-1138 (Harosoy-w4) | Near white | W1W2w3w4WmWpt | L2^b (6) × Laredo | |
| 222-A-3 | Near white | | X-ray induced mutant of Bay | 222-A-3 × Clark-w4 (2007) |
| E30-D-1 | Light purple | | EMS-induced mutant of Bay | E30-D-1 × Clark (2012); E30-D-1 × Clark-w4 (2008) |
| kw4 | Near white | | G. soja accession of South Korea | Harosoy-w4 × kw4 (2008) |
| T321 | Dilute purple | W1W2w3w4-dpWmWpt | Germinal revertant derived from T322 | |
| T369 | Pale | W1W2w3w4-pWmWpt | Germinal revertant derived from T322 | |

^a Phytophthora and pustule-resistant Clark isoline with genes Rps1 and rxp.
^b Phytophthora and pustule-resistant Harosoy isoline with genes Rps1 and rxp.



were individually grown with a spacing of 70 cm between rows and 10 cm between plants. Flower color was
recorded in individual F1, F2 and F3 plants.

Analysis of flavonoids 

Banner petals were collected on the day of opening from field-grown plants in 2008. Three 200 mg samples of
banner petals were collected in 2 ml of MeOH containing 0.1% (v/v) HCl for anthocyanin analysis. Three 200 mg
samples in 2 ml of absolute MeOH were also collected for the determination of flavonols and dihydroflavonols.
High performance liquid chromatography (HPLC) of anthocyanins, flavonols and dihydroflavonols was performed
following previously described protocols [11]. The 2 ml extracts were filtered through disposable filtration units
(Maishoridisc H-13-5, Tosoh) and 10 µl from each sample was subjected to HPLC analysis. The amount of flavonoids
was estimated from the pertinent peak area in the HPLC chromatogram (detection wavelength of anthocyanins =
530 nm; flavonols = 351 nm; dihydroflavonols = 290 nm). The peak area was subjected to analysis of variance using
the Statistica software (StatSoft).

Molecular cloning 

Total RNA was extracted from banner petals (200 mg) using the TRIZOL Reagent (Invitrogen) according to the
manufacturer's instructions. cDNA was synthesized by reverse transcription of 5 µg of total RNA using the
Superscript III First-Strand Synthesis System (Invitrogen) and an oligo(dT) primer according to the manufacturer's
instructions. The full-length cDNA was cloned by end-to-end PCR from the plant materials using a pair of PCR
primers shown in Table 2. The PCR mixture contained 0.5 µg of cDNA, 10 pmol of each primer, 10 pmol of
nucleotides and 1 unit of ExTaq in 1 × ExTaq Buffer supplied by the manufacturer in a total volume of 50 µl.
A 5 min denaturation at 94°C was followed by 30 cycles of 30 sec denaturation at 94°C, 1 min annealing at 59°C
and 1 min extension at 72°C. A final 7 min extension at 72°C completed the program. The PCR was performed in an
Applied Biosystems 9700 thermal cycler. The PCR products were cloned into the pCR 2.1 vector (Invitrogen) and
sequenced. To evaluate the approximate size of PCR amplicons, PCR products were separated on a 2% agarose
gel and visualized by EtBr staining.

Genomic DNA was isolated from trifoliolate leaves by the CTAB method [21]. Genome sequences containing the
entire coding region (about 3.3 kb) and the 5' upstream region (about 1.2 kb) of Clark and kw4 were determined by
cloning two fragments overlapping each other using the PCR primers listed in Table 2. The 5' upstream region
was also cloned from Bay and E30-D-1. The PCR mixture contained 10 ng of genomic DNA, 10 pmol of each
primer, 10 pmol of nucleotides and 1 unit of ExTaq in 1 × ExTaq Buffer in a total volume of 50 µl. The PCR
products were cloned into the pCR 2.1 vector.

Sequencing analysis 

Nucleotide sequences of both strands were determined with the BigDye terminator cycle method using an
ABI3100 Genetic Analyzer (Applied Biosystems). Primers are listed in Table 2. Nucleotide sequences and the
putative amino acid translations were analyzed with GENETYX ver. 8.1.2 (GENETYX). Sequences were
aligned using ClustalW (http://clustalw.ddbj.nig.ac.jp/index.php?lang=ja) at default settings.

CAPS analysis 

Genomic DNA of Clark-w4, E30-D-1 and the 40 F2 plants that were used for F3 progeny tests was isolated from
trifoliolate leaves by the CTAB method. A pair of PCR primers (Table 2) was designed to detect the single-base
substitution found in E30-D-1. The base substitution within the restriction site would result in the presence/absence of
Table 2 PCR primers used in this study

| Purpose | Target | Forward primer (5'-3') | Reverse primer (5'-3') |
|---|---|---|---|
| cDNA cloning | DFR2 | AACCAAAACAACGAGAGAGA | CTOTCCCTGATATGAAAGC |
| cDNA sequencing | DFR2 | TGCTAGACATCATGAAAGCA | TGTGAACAGCATATGTACCT |
| | | CACTGCTCmCACTAATCA | GATOGTGAAAGAGCAGTGA |
| | | TACCCTGAGTATAATGTCCT | TOACGCATGCmCATGAT |
| Cloning of genomic fragment | Upstream fragment of DFR2 | ACGGmCTOCATOCA^ | ACTOAmCAGCCATGGTA |
| | Downstream fragment of DFR2 | GTOATCAATGCACATAGAC | CTOTCCCTGATATGAAAGC |
| Sequencing of genomic fragment | Upstream fragment of DFR2 | TACAAGTOTCATCACGATC | GAAGCmCATGAAGCCA^ |
| | | mGGTGTACACTCGTATGT | CACAATOTATCATOGGCA |
| | | ATGTAACATGATGGTOGTG | AACCACCATOCTOATACC |
| | Downstream fragment of DFR2 | CI! II ICTCTGCAGGmCA | TAGTGGATGAATATGATOT |
| | | AAGTACCATOCAACATOA | GATAGATGACAGTOTOTC |
| | | TGTOTGCTCmGGCATAT | ACCCTGAGTATAATGTCC^ |
| CAPS analysis | DFR2 | ACGGmCTOCATOCA^ | CAAATGCTOACCTOTOA |
| Cloning of 5' upstream region | Upstream fragment of DFR2 | AGAGATATATAAGAAGTOGGA | TATCACGAAATAG 1 1 1 1 IGIAAT |
| | Downstream fragment of DFR2 | CCmACCATCTACAAGATAA | ATGATGTAATATOGGAACCT |
| Sequencing of 5' upstream region | DFR2 | GAAAAGAGAAATAGGTATOTA | GmAACTAATCAAACTAAA^ |
| Real-time PCR | DFR2 | CCAAGGACCCTGAGAATGAA | CAGAAGTCAACATCGCTCCA |
| | Actin | GTCCmCAGGAGGTACAACC | CCACATCTGCTGGAAGGTGC |


the restriction site of BsrGI in the amplified product. The PCR mixture contained 30 ng of genomic DNA, 5
pmol of each primer, 10 pmol of nucleotides and 1 unit of ExTaq in 1 × ExTaq Buffer supplied by the
manufacturer (Takara Bio) in a total volume of 25 µl. After an initial 30 sec denaturation at 94°C, there were
30 cycles of 30 sec denaturation at 94°C, 1 min annealing at 56°C and 1 min extension at 72°C. A final 7 min
extension at 72°C completed the program. The amplified products were digested with BsrGI, and the digests
were separated on an 8% nondenaturing polyacrylamide gel in 1 × TBE buffer (90 mM Tris-borate, 2 mM EDTA,
pH 8.0). After electrophoresis, the gel was stained with ethidium bromide and the DNA fragments were
visualized under UV light.

Quantitative real-time PCR 

For quantitative real-time PCR, total RNA (5 µg) from each of three replicate banner petal samples was
reverse-transcribed using the Superscript III First-Strand Synthesis System and an oligo(dT) primer. Primer
sequences are listed in Table 2. The PCR mixture contained 0.4 µl of cDNA synthesis reaction mixture, 6 pmol of
each primer, 1 × ROX reference dye, 1 × SYBR Premix Dimer Eraser (Takara Bio) and water to a final volume of
20 µl. Analysis was done using the StepOnePlus Real-Time PCR System (Applied Biosystems). The initial 30 sec
denaturation at 95°C was followed by 40 cycles of 3 sec denaturation at 95°C, 30 sec annealing at 58°C and
30 sec extension at 72°C. The expression level of the soybean actin gene (GenBank accession number: J01298) [22]
was used to normalize target gene expression.
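
The methods state only that actin was used to normalize DFR2 expression; the quantification model itself is not given. As a minimal sketch, assuming the widely used comparative Ct (2^-ΔΔCt) approach and roughly 100% amplification efficiency, the normalization could be expressed as follows. The function name and every Ct value are hypothetical and are not data from this study.

```python
# Sketch of reference-gene normalization by the comparative Ct method.
# Assumption: 2^-ddCt with ~100% PCR efficiency; the paper does not state
# which quantification model was actually used.

def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Expression of the target gene relative to a calibrator sample.

    ct_target / ct_reference: Ct of DFR2 and actin in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two Ct values in the calibrator
    (for example, the wild-type cultivar).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)


if __name__ == "__main__":
    # Hypothetical Ct values: the sample's DFR2 signal appears 3 cycles later
    # than in the calibrator while actin is unchanged.
    print(relative_expression(28.0, 18.0, 25.0, 18.0))
```

A mutant whose normalized Ct is three cycles higher than the calibrator would be reported at one eighth of the calibrator level under these assumptions.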

Accession numbers 

Sequence data of the DFR2 gene were deposited in the DDBJ Data Libraries under accession nos. AB872212
(cDNA of Bay), AB872213 (cDNA of Clark-w4), AB872214 (cDNA of 222-A-3), AB872215 (cDNA of E30-D-1),
AB872216 (genomic DNA of Clark) and AB872217 (genomic DNA of kw4).

Results 

Genetic analysis 

F1 plants derived from a cross between Harosoy-w4 and kw4 had near white flowers (Table 3). All of the 116
plants of the F2 population had near white flowers, suggesting that the flower color of kw4 was controlled by the
w4 allele. F1 plants derived from a cross between 222-A-3 and Clark-w4 had near white flowers. All of the 109
plants of the F2 population had near white flowers, suggesting that the flower color of 222-A-3 was also controlled
by the w4 allele.

F1 plants derived from a cross between E30-D-1 and Clark had purple flowers. A total of 112 plants of the F2
population segregated into 84 plants with purple flowers and 28 plants with light purple flowers. The segregation
fitted a 3:1 ratio (χ² = 0.00, P = 1.00), suggesting that a single gene controls flower color and that the allele for
purple flower was dominant to that for light purple flower. F1 plants derived from a cross between E30-D-1
Table 3 Segregation of flower color in F1 plants and F2 populations derived from crosses between soybean cultivar
Clark or near-isogenic lines (Harosoy-w4 and Clark-w4) and flower color variants, Glycine soja accession kw4 and
soybean mutant lines 222-A-3 and E30-D-1, in Tsukuba, Japan

| Generation | Year | Total | Purple | Light purple | Near white | Expected ratio | χ² value | Probability (P value) |
|---|---|---|---|---|---|---|---|---|
| kw4 | 2009 | 10 | - | - | 10 | - | - | - |
| Harosoy-w4 (H-w4) | 2009 | 10 | - | - | 10 | - | - | - |
| H-w4 × kw4 F1 | 2009 | 5 | - | - | 5 | - | - | - |
| H-w4 × kw4 F2 | 2009 | 116 | - | - | 116 | - | - | - |
| 222-A-3 (222) | 2010 | 10 | - | - | 10 | - | - | - |
| Clark-w4 (C-w4) | 2010 | 10 | - | - | 10 | - | - | - |
| 222 × C-w4 F1 | 2010 | 5 | - | - | 5 | - | - | - |
| 222 × C-w4 F2 | 2010 | 109 | - | - | 109 | - | - | - |
| E30-D-1 (E) | 2010 | 10 | - | 10 | - | - | - | - |
| Clark (C) | 2010 | 10 | 10 | - | - | - | - | - |
| E × C F1 | 2010 | 4 | 4 | - | - | - | - | - |
| E × C F2 | 2013 | 112 | 84 | 28 | - | 3:1 | 0.00 | 1.00 |
| E × C-w4 F1 | 2010 | 3 | - | 3 | - | - | - | - |
| E × C-w4 F2 | 2010 | 111 | - | 82 | 29 | 3:1 | 0.08 | 0.78 |


and Clark-w4 had light purple flowers. A total of 111 plants of the F2 population segregated into 82 plants
with light purple flowers and 29 plants with near white flowers. The segregation fitted a 3:1 ratio (χ² = 0.08,
P = 0.78), suggesting that the W4 locus controls the flower color and that the allele for light purple flower was
dominant to that for near white flower. All of the ten F3 families derived from F2 plants with near white flowers
had near white flowers. A total of 40 families derived from F2 plants with light purple flowers segregated into
16 families fixed for light purple flowers and 24 families segregating for flower color (Table 4). The segregation
fitted a 1:2 ratio (χ² = 0.80, P = 0.37), confirming that an allele at the W4 locus controls flower color. The new
allele was designated as w4-lp. The gene symbol was approved by the Soybean Genetics Committee. The dominance
relationship of the alleles is W4 > w4-lp > w4.
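
The goodness-of-fit values quoted above can be reproduced with a short calculation. The sketch below assumes a plain Pearson chi-square test without continuity correction and one degree of freedom (two phenotypic classes); the function names are illustrative, and the paper does not say which software was used for this step.

```python
import math


def chi_square_gof(observed, ratio):
    """Pearson chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    crosses = {
        "E30-D-1 x Clark F2 (purple : light purple)": ([84, 28], [3, 1]),
        "E30-D-1 x Clark-w4 F2 (light purple : near white)": ([82, 29], [3, 1]),
        "E30-D-1 x Clark-w4 F3 (fixed : segregating)": ([16, 24], [1, 2]),
    }
    for name, (observed, ratio) in crosses.items():
        chi2 = chi_square_gof(observed, ratio)
        print(f"{name}: chi2 = {chi2:.2f}, P = {p_value_one_df(chi2):.2f}")
```

With two classes the test has a single degree of freedom, which is why the closed-form `erfc` expression is sufficient and no statistics library is needed.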

HPLC analysis 

Four anthocyanin components, A1: malvidin 3,5-di-O-glucoside, A2: petunidin 3,5-di-O-glucoside, A3:
delphinidin 3,5-di-O-glucoside and A4: delphinidin 3-O-glucoside, were detected, in agreement with previous
studies [11,12] (Table 5). Flowers of T369 contained 59.8% of the total anthocyanins of Clark. Lower amounts
of anthocyanins were detected in T321 (44.7%) and E30-D-1 (39.3%). Near white flowers of 222-A-3 had the
lowest level of anthocyanins (15.6%). Near white flowers of kw4 had only trace amounts of two components, A1
and A2. All cultivars and lines except for 222-A-3 and kw4 had all four components, with the amounts
decreasing in the following order: A1 > A2 > A3 > A4.

All cultivars and lines had eight flavonol glycoside components, F1 (kaempferol 3-O-gentiobioside), F2
(kaempferol 3-O-rutinoside), F3 (kaempferol 3-O-glucoside), F4 (kaempferol 3-O-glycoside), F5 (kaempferol
3-O-rhamnosyl-(1→2)-[glucosyl-(1→6)-galactoside]), F6 (quercetin 3-O-gentiobioside), F7 (kaempferol
7-O-glucoside) and F8 (kaempferol 7-O-diglucoside), in accordance with previous studies [11-13] (Table 6).
The total amounts of flavonol glycosides were not very different among cultivars and lines except for T369.
F1 was most abundant and accounted for about 80% of flavonol glycosides in these cultivars and lines, in
accordance with previous



Table 4 Segregation of flower color in F3 families derived from a cross between E30-D-1 and a soybean near-isogenic
line Clark-w4 in 2012 and 2013 in Tsukuba, Japan

| Generation | Total families | Fixed for light purple | Segregating | Fixed for near white | Expected ratio | χ² value | Probability (P value) |
|---|---|---|---|---|---|---|---|
| E30-D-1 × Clark-w4 F3 (light purple)^a | 40 | 16 | 24 | - | 1:2 | 0.80 | 0.37 |
| E30-D-1 × Clark-w4 F3 (near white)^b | 10 | - | - | 10 | - | - | - |

^a F3 families derived from F2 plants with light purple flowers.
^b F3 families derived from F2 plants with near white flowers.

Table 5 Anthocyanin content [mean ± SD (x 10^)]
according to HPLC analysis of flower petals from soybean
and Glycine soja in 2008 at Tsukuba, Japan

| Line name | A1^a | A2 | A3 | A4 | Total |
|---|---|---|---|---|---|
| Clark | 933 ± 40 | 538 ± 21 | 399 ± 13 | 255 ± 20 | 2,125 ± 77 |
| Bay | 1,745 ± 326 | 610 ± 99 | 323 ± 62 | 267 ± 14 | 2,945 ± 331 |
| 222-A-3 | 142 ± 28 | 189 ± 30 | 0 ± 0 | 0 ± 0 | 331 ± 16 |
| E30-D-1 | 345 ± 4 | 241 ± 20 | 151 ± 3 | 98 ± 65 | 835 ± 65 |
| kw4 | t^b | t^b | 0 ± 0 | 0 ± 0 | |
| T321 | 369 ± 107 | 255 ± 22 | 178 ± 36 | 148 ± 4 | 950 ± 155 |
| T369 | 513 ± 32 | 348 ± 17 | 232 ± 44 | 178 ± 8 | 1,271 ± 89 |
| LSD0.05 | 239 | 76 | 58 | 48 | 268 |

^a A1: malvidin 3,5-di-O-glucoside, A2: petunidin 3,5-di-O-glucoside,
A3: delphinidin 3,5-di-O-glucoside, A4: delphinidin 3-O-glucoside.
^b Trace amount.



studies [11,12]. The amount of F2 was extremely low in kw4 and comprised only 0.1% of the total amount of
flavonol glycosides. Flowers of T369 had a substantially lower amount of flavonol glycosides (16.0% of Clark).

Only one kind of dihydroflavonol (D1, aromadendrin 3-O-glucoside) was detected in all cultivars and lines
except for T369 (Table 7). The amount varied from 57.4% (E30-D-1) to 163.8% (kw4) of that in Clark. In
contrast, flower petals of T369 contained only 11.4% of the D1 found in Clark, in addition to two unique peaks
corresponding to dihydroflavonols, D2 and D3 (Figure 3).
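
The percentages quoted in this section are ratios of mean HPLC peak areas to the corresponding Clark value. The sketch below recomputes them from the means listed in Tables 5 to 7; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Mean peak areas copied from Tables 5-7 (totals for anthocyanins and
# flavonol glycosides, D1 for dihydroflavonols). The data structure and the
# helper name are hypothetical.

MEAN_PEAK_AREAS = {
    "anthocyanins (total)": {
        "Clark": 2125, "T369": 1271, "T321": 950, "E30-D-1": 835, "222-A-3": 331,
    },
    "flavonol glycosides (total)": {"Clark": 11454, "T369": 1837},
    "D1 (aromadendrin 3-O-glucoside)": {
        "Clark": 843, "E30-D-1": 484, "kw4": 1381, "T369": 96,
    },
}


def percent_of_reference(values, reference="Clark"):
    """Express each line's mean peak area as a percentage of the reference line."""
    ref = values[reference]
    return {
        line: round(100.0 * value / ref, 1)
        for line, value in values.items()
        if line != reference
    }


if __name__ == "__main__":
    for compound_class, values in MEAN_PEAK_AREAS.items():
        print(compound_class, percent_of_reference(values))
```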

Molecular cloning 

DNA fragments of about 1.1 kb were amplified by RT-PCR in Clark, Bay, 222-A-3 and E30-D-1 (Figure 4).
Fragments of about 1.4 kb were amplified from Clark-w4. No amplification product was observed in kw4. The
coding regions of the DFR2 gene of Clark and Bay were 1065 bp long and encoded 354 amino acids. The amino
acid sequences were identical except for two substitutions near the C-terminus, at positions 338 (valine or
glutamic acid) and 353 (arginine or glutamine). Comparison of nucleotide sequences between cDNA and genomic
DNA of Clark revealed that the DFR2 gene has six exons and five introns, similar to a previous report
(Figure 5) [5].

Bay had a T at nucleotide position 29 which was absent in 222-A-3. This deletion probably generated a
truncated polypeptide consisting of only 24 amino acids (Figure 6A and 6B). The polypeptide lacked the NADPH
binding domain [23]. In E30-D-1, a single base was substituted from G to A at nucleotide position 116
compared with Bay (Figure 6C). The base substitution altered the amino acid at position 39 from arginine to
histidine. The 5' upstream region of E30-D-1 was identical with that of Bay and Clark. In Clark-w4, the cDNA
had a 344-bp insertion compared with Clark and Bay. The insertion corresponded to the fourth intron with five
nucleotide substitutions compared with Clark, suggesting that the fourth intron was retained in Clark-w4. In
Clark-w4, a single-base G at the start of the fourth intron was changed to A compared with the genome sequence
of Clark (Figure 6D). The base substitution may have abolished the 5' splice site (GT), resulting in the
retention of the intron (Figure 5). The retention altered the amino acid sequence from position 217 onward and
caused premature translation termination at amino acid position 227 (Figure 6D).
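
The coordinates reported above can be checked with simple codon arithmetic. In the sketch below, nucleotide positions are assumed to be numbered from the A of the start codon, and the arginine codons used for the substitution are hypothetical (the actual third base in DFR2 is not given in the text); the helper names are likewise illustrative.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_number(nt_position):
    """1-based codon index containing a 1-based coding-sequence position."""
    return (nt_position - 1) // 3 + 1


def position_in_codon(nt_position):
    """1-based position (1, 2 or 3) of a nucleotide within its codon."""
    return (nt_position - 1) % 3 + 1


def substitute(codon, position, new_base):
    """Return the codon with the base at the 1-based position replaced."""
    index = position - 1
    return codon[:index] + new_base + codon[index + 1:]


if __name__ == "__main__":
    # G-to-A substitution at nucleotide 116 (E30-D-1).
    print(codon_number(116), position_in_codon(116))
    # Hypothetical arginine codons; only CGT/CGC give histidine after G->A.
    for wild_type in ("CGT", "CGC"):
        mutant = substitute(wild_type, position_in_codon(116), "A")
        print(wild_type, CODON_TABLE[wild_type], "->", mutant, CODON_TABLE[mutant])
    # A 1065-bp coding region holds 355 codons, i.e. 354 residues plus a stop.
    print(1065 // 3 - 1)
    # A retained 344-bp intron is not a multiple of three, so the frame shifts.
    print(344 % 3)
```

Nucleotide 116 falls on the second base of codon 39, and a G-to-A change at the second base of a CGN arginine codon yields histidine when the third base is T or C, which is consistent with the reported substitution.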

In kw4, transcripts of the DFR2 gene in the flower petals were not detected by RT-PCR. The genomic fragment
containing the entire coding region was amplified by PCR. Six exons and five introns were assumed, similar
to Clark (Figure 5). The amino acid sequence was identical with that of Clark. A 367-bp fragment was deleted in the
third intron of kw4 (Figure 5). The 5' upstream region of kw4 had six single-base substitutions, three single-base
indels, two two-base indels and a three-base alteration including one indel (Additional file 1: Figure S1).

CAPS analysis 

PCR with CAPS primers generated amplified products of 377 bp in Bay, Clark, Clark-w4, 222-A-3 and E30-D-1
(Figure 7). Digestion with BsrGI generated a band of



Table 6 Flavonol glycoside content [mean ± SD (x 10^)] according to HPLC analysis of flower petals from soybean and
Glycine soja in 2008 at Tsukuba, Japan

| Line name | F1^a | F2 | F3 | F4 | F5 | F6 | F7 | F8 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Clark | 9,432 ± 103 | 772 ± 34 | 177 ± 6 | 441 ± 26 | 353 ± 38 | 138 ± 10 | 13 ± 0 | 128 ± 13 | 11,454 ± 177 |
| Bay | 8,508 ± 278 | 788 ± 52 | 162 ± 5 | 246 ± 10 | 429 ± 26 | 131 ± 8 | 53 ± 0 | 179 ± 15 | 10,496 ± 385 |
| 222-A-3 | 7,836 ± 426 | 698 ± 39 | 134 ± 9 | 275 ± 33 | 459 ± 42 | 124 ± 2 | 323 ± 55 | 365 ± 15 | 10,214 ± 477 |
| E30-D-1 | 8,001 ± 491 | 732 ± 37 | 168 ± 78 | 335 ± 24 | 465 ± 20 | 117 ± 32 | 274 ± 43 | 318 ± 56 | 10,409 ± 630 |
| kw4 | 10,947 ± 386 | 16 ± 2 | 432 ± 5 | 695 ± 26 | 802 ± 50 | 154 ± 7 | 354 ± 52 | 417 ± 38 | 13,816 ± 533 |
| T321 | 9,417 ± 476 | 805 ± 50 | 159 ± 17 | 371 ± 20 | 523 ± 16 | 100 ± 2 | 174 ± 10 | 287 ± 12 | 11,838 ± 598 |
| T369 | 703 ± 9 | 214 ± 26 | 102 ± 2 | 135 ± 1 | 243 ± 5 | 151 ± 2 | 130 ± 6 | 158 ± 16 | 1,837 ± 55 |
| LSD0.05 | 647 | 69 | 56 | 41 | 58 | 24 | 60 | 52 | 827 |

^a F1 (kaempferol 3-O-gentiobioside), F2 (kaempferol 3-O-rutinoside), F3 (kaempferol 3-O-glucoside), F4 (kaempferol 3-O-glycoside), F5 (kaempferol 3-O-rhamnosyl-
(1→2)-[glucosyl-(1→6)-galactoside]), F6 (quercetin 3-O-gentiobioside), F7 (kaempferol 7-O-glucoside), F8 (kaempferol 7-O-diglucoside).
Table 7 Dihydroflavonol content [mean ± SD (x 10^)]
according to HPLC analysis of flower petals from soybean
and Glycine soja in 2008 at Tsukuba, Japan

| Line name | D1^a | D2^b | D3^b | Total |
|---|---|---|---|---|
| Clark | 843 ± 53 | 0 ± 0 | 0 ± 0 | 843 ± 53 |
| Bay | 758 ± 83 | 0 ± 0 | 0 ± 0 | 758 ± 83 |
| 222-A-3 | 593 ± 40 | 0 ± 0 | 0 ± 0 | 593 ± 40 |
| E30-D-1 | 484 ± 24 | 0 ± 0 | 0 ± 0 | 484 ± 24 |
| kw4 | 1,381 ± 58 | 0 ± 0 | 0 ± 0 | 1,381 ± 58 |
| T321 | 646 ± 22 | 0 ± 0 | 0 ± 0 | 646 ± 22 |
| T369 | 96 ± 8 | 153 ± 39 | 54 ± 10 | 303 ± 54 |
| LSD0.05 | 86 | 27 | 7 | 94 |

^a D1: aromadendrin 3-O-glucoside.
^b D2, D3: unidentified dihydroflavonols.



194 bp in E30-D-1, whereas the amplified products were not digested in the other materials (Figure 7). CAPS
analysis of an F2 population derived from a cross between E30-D-1 and Clark-w4, together with the F3 progeny
tests, revealed that plants fixed for light purple flowers had only the shorter band, plants fixed for near white
flowers had only the longer band, and plants segregating for flower color had both bands. Thus, the CAPS marker
co-segregated with flower color.
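
The logic of the CAPS assay can be illustrated with an in-silico digest. BsrGI recognizes TGTACA and cuts after the first T; everything else in the sketch is an assumption made for illustration: the two 377-bp amplicons are synthetic placeholder sequences, the position of the site is chosen only so that the cut reproduces the reported 194-bp band, the uncut allele's sequence at that site is hypothetical, and the function names are invented.

```python
BSRGI_SITE = "TGTACA"  # BsrGI recognition sequence
BSRGI_CUT_OFFSET = 1   # the enzyme cuts after the first T (T^GTACA)


def digest(sequence, site=BSRGI_SITE, cut_offset=BSRGI_CUT_OFFSET):
    """Return fragment lengths after cutting a linear sequence at every site."""
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    fragments = []
    previous = 0
    for cut in cut_positions:
        fragments.append(cut - previous)
        previous = cut
    fragments.append(len(sequence) - previous)
    return fragments


def call_genotype(band_sizes, uncut=377, cut_band=194):
    """Interpret a set of band sizes as a genotype at the W4 locus."""
    has_uncut = uncut in band_sizes
    has_cut = cut_band in band_sizes
    if has_cut and has_uncut:
        return "w4-lp/w4"      # segregating for flower color
    if has_cut:
        return "w4-lp/w4-lp"   # fixed for light purple
    if has_uncut:
        return "w4/w4"         # fixed for near white
    return "no call"


# Synthetic 377-bp placeholder amplicons (not the real DFR2 sequence).
AMPLICON_W4_LP = "A" * 193 + "TGTACA" + "C" * 178  # site present
AMPLICON_W4 = "A" * 193 + "TGTGCA" + "C" * 178     # hypothetical site-less allele

if __name__ == "__main__":
    cut = digest(AMPLICON_W4_LP)
    uncut = digest(AMPLICON_W4)
    print(cut, call_genotype(set(cut)))
    print(uncut, call_genotype(set(uncut)))
    print(call_genotype(set(cut) | set(uncut)))
```

A heterozygous F2 plant contributes both the digested and undigested amplicon, which is why plants segregating for flower color show both bands.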



Quantitative real-time PCR 

Results of real-time PCR are presented in Figure 8. The transcript level of T321 was low, at 16.8% of that of
Bay. Transcript levels of 222-A-3, E30-D-1 and Clark-w4 were much lower, at 7.7, 3.8 and 3.7%, respectively.
Transcripts of the DFR2 gene of kw4 were not detected by real-time PCR. In contrast to the above flower color
variants, the transcript level of T369 was about 2.3 times that of Bay (Figure 8).

Discussion 

Previous studies revealed that the W4 gene was mutated in the flower color variants Clark-w4, T321 and
T369 [3,5]. In this study, complementation tests revealed that the flower color of 222-A-3 and E30-D-1, and that
of a G. soja accession, kw4, was also controlled by this gene. Amino acid polymorphism or null expression of the
DFR2 gene was associated with flower color variation.

In 222-A-3, a single-base deletion caused a frame-shift mutation from amino acid position 11; this is expected
to produce a truncated polypeptide of only 24 amino acids that lacks the NADPH binding domain. Thus, the
DFR2 transcript of 222-A-3 may be nonfunctional. In Clark-w4, the first nucleotide of the fourth intron was
substituted from G to A. The base substitution in Clark-w4 may have abolished the 5' splice site (spliceosome
recognition site). Retention of the fourth intron caused a frame-shift mutation and changed subsequent amino



Figure 3 HPLC chromatogram of dihydroflavonols extracted from flower petals of a soybean cultivar Clark and T369. A total of 200 mg
of banner petals was extracted with 2 ml of MeOH. Eluents: MeCN/H2O/H3PO4 (22:78:0.2). Flow-rate: 1.0 ml/min. Injection: 10 µl. Detection:
290 nm. D1, aromadendrin 3-O-glucoside; D2 and D3, unidentified dihydroflavonols.
Figure 4 Agarose gel electrophoresis of RT-PCR products
corresponding to the entire coding region of DFR2 gene in
soybean and Glycine soja. M, 100 bp ladder marker; Clk, Clark;
C-w4, Clark-w4; 222, 222-A-3; E30, E30-D-1. The migration of size
markers (bp) is shown to the left of the gel.



acids. Translation was prematurely terminated at amino
acid position 227. DFR genes have many amino acids
conserved across plant species downstream of the
mutation [24]. The results strongly suggest that the
DFR2 gene of Clark-w4 is not functional. In kw4, the
DFR2 gene was not expressed in flower petals. A 367-bp
fragment was deleted from the third intron of this gene
in kw4. However, it is not clear whether the deletion in
the intron is responsible for the null gene expression.
Therefore, we investigated the 5' upstream region to
check whether any mutation had occurred in the promoter region.
There were many nucleotide polymorphisms in the 5'
upstream region: six single-base substitutions, three
single-base indels, two two-base indels and a three-base
alteration including an indel. The accumulation of a
substantial number of mutations in the promoter region,
which is probably the reason for non-transcription of this
locus in kw4, is characteristic of genes that are being
deactivated into pseudogenes. Promoter assays may be
necessary to determine which polymorphism is critical for
gene expression. Features of the DNA sequences in Clark-w4,
222-A-3 and kw4 strongly suggest that complete loss-of-function
of the DFR2 gene may lead to a substantial reduction
of anthocyanins and near white flowers.
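
The two truncating mutations described above can be checked by translating the partial sequences shown in Figure 6 (panels B and D). The following is a minimal sketch, assuming the sequences as recovered from the figure and a reading frame chosen to match the amino acid lines printed there; the variable and function names are illustrative only and are not taken from the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base, stopping after the first stop codon ('*').

    A trailing incomplete codon is ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Figure 6B: partial first-exon cDNA of Bay, starting at the codon for Glu-8.
BAY_EXON1 = "GAAAGTGTTTGCGTTACAGGAGCCTCTGGTTTCATCGGGTCATGGCTTGTCATGA"
# 222-A-3 lacks one T of the TTT run (0-based index 7 in the aligned fragment).
MUT_222_EXON1 = BAY_EXON1[:7] + BAY_EXON1[8:]

# Figure 6D: end of the fourth exon followed by the start of the fourth intron.
EXON4_END = "ACTGCTCTTTCACTAATCACAG"
INTRON4_START_CLARK = "GTGCCCTTTATACGTGGATTTTGTTGTCATTTTAA"
# Clark-w4: the first intron nucleotide is A instead of G, and the intron is retained.
INTRON4_START_W4 = "A" + INTRON4_START_CLARK[1:]

bay_peptide = translate(BAY_EXON1)
mut_222_peptide = translate(MUT_222_EXON1)
w4_readthrough = translate(EXON4_END + INTRON4_START_W4)

print(bay_peptide)       # residues 8-25 of the Bay protein
print(mut_222_peptide)   # frame-shifted residues ending in a premature stop
print(w4_readthrough)    # exon 4 residues followed by intron-encoded residues and a stop
```

In the 222-A-3 fragment the reading frame changes at the fourth codon of the fragment (residue 11 of the protein) and reaches a stop codon after 17 residues, which together with the seven preceding residues gives the 24-residue polypeptide mentioned in the text.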

Soybean has three DFR genes, DFR1, DFR2 and
DFR3 [17]. The function of DFR2 may be partially
supplemented by the activity of the other DFR genes, depending
on environmental conditions. The transcript levels of
Clark-w4 and 222-A-3 were substantially lower than that of
Bay, probably because of nonsense-mediated mRNA decay,
a surveillance mechanism that eliminates aberrant mRNA
transcripts containing premature stop codons [25].

The 5' upstream regions of the DFR2 gene in E30-D-1,
Bay and Clark were identical. In the first exon of E30-D-1,
however, a single-base substitution altered the amino acid
at position 39 from arginine to histidine. This residue
lies slightly downstream of the NADPH binding
region. No catalytic domain has been assigned to the
region, but the arginine residue is conserved across eight
plant species [24]. Further, a CAPS marker designed to detect the
base substitution co-segregated with flower color. These
results suggest that the amino acid substitution might
have affected transcript abundance and/or DFR function,
resulting in reduced anthocyanin content and paler
flower color. Transgenic experiments may be necessary to
ascertain the functional importance of this residue.
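
The logic of the CAPS marker can be illustrated with the partial cDNA sequences of Figure 6C: the single-base substitution both changes the arginine codon to a histidine codon and creates a restriction site that is absent from Bay. The sketch below assumes the sequences as recovered from the figure and assumes the recognition sequence TGTACA (that of BsrGI); the enzyme name is garbled in the extracted figure text, so this is an inference from the sequence, and all identifiers are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed recognition sequence for the CAPS enzyme (BsrGI).
RECOGNITION_SITE = "TGTACA"

# Figure 6C: partial first-exon cDNA, in frame from the first base.
BAY = "GGCTACACGGTCCGAGCCACTGTACGCGATCCAGCTAACATGAAGAAG"
E30_D_1 = "GGCTACACGGTCCGAGCCACTGTACACGATCCAGCTAACATGAAGAAG"


def differing_positions(a, b):
    """Return 0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = differing_positions(BAY, E30_D_1)
codon_start = diffs[0] - diffs[0] % 3          # start of the codon containing the SNP
bay_codon = BAY[codon_start:codon_start + 3]
e30_codon = E30_D_1[codon_start:codon_start + 3]

print(diffs)                                    # position(s) of the substitution
print(bay_codon, CODON_TABLE[bay_codon])        # codon and residue in Bay
print(e30_codon, CODON_TABLE[e30_codon])        # codon and residue in E30-D-1
print(BAY.find(RECOGNITION_SITE))               # -1 means the site is absent
print(E30_D_1.find(RECOGNITION_SITE))           # 0-based start of the site, if present
```

Under these assumptions only the E30-D-1 amplicon would be cut, which is what allows the digest pattern to be scored against flower color in a segregating population.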

Flavonol glycoside content in flower petals of T321 was
similar to that of Clark. In contrast, that of T369 was 16.0%
of that of Clark. The DFR2 gene was over-expressed in flower
petals of T369 but was barely expressed in T321. The
reduction of flavonol glycosides in T369 may be explained by
substrate competition between over-expressed DFR and
flavonol synthase (Figure 1). Flower petals of T369 contained
substantially lower amounts of D1 but had unique
dihydroflavonol components, D2 and D3, that are absent in the
soybean and G. soja accessions analyzed so far. Over-expression
of the DFR2 gene may be responsible for the unique
dihydroflavonol composition. The chemical structures of D2 and
D3 should be determined to investigate the novel features
of DFR2 function displayed by T369.

In T321 and T369, Tgm9 was excised from the second
intron of the DFR2 gene, leaving behind 4- and 0-bp
footprints, respectively [5]. A 5' end fragment of Tgm9
(944 bp) was integrated 1043 bp upstream of the
transcription start site in T321. A fragment of Tgm9 was
inserted 1034 bp upstream of the transcription
start site in T369 [5]. In both cases, excision of Tgm9



Figure 5 Intron/exon structure of the DFR2 gene from Clark (G. max), Clark-w4 (a Clark near-isogenic line with the w4 allele) and kw4 (a Glycine
soja accession). Legend symbols mark exons, introns, the start codon and the stop codon. Scale bar: 100 bp.
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A 




Bay 


MGSSSASESVCVTGASGFIGSWL\a>lRLIERGYWRATWDPANMKKVKHL\^LPGAKTKL 60 


222-A-3


MGSSSASESVALQEPLVSSGHGLS* 24

SLWKADLAQEGSFDEAIKGCTGVFHVATPMDFDSKDPENEVIKPTINGLLDIMKACVKAK 120 
WRRLVFTSSAGTVDVTEHPNPVIDENCWSDVDFCTR\™TGI'?MYFVSKTLAEQEAWKYA 180 
KEHNIDFISVIPPLWGPFLMPTMPPSLITALSLITGNESHYHIIKQGQF\^HLDDLCLGH 24 0 
IF\^FENPKAEGRYICCSHE7\TIHDIAKLLNQKYPEYNVLTKFKNIPDELDIIKFSSKKIT 300 
DLGFKFKYSLEDMFTGA\^TCREKGLLPKPEETTVNNVLLPKPAETT\T^DTMRK* 354 


B

Bay
GAAAGTGTTTGCGTTACAGGAGCCTCTGGTTTCATCGGGTCATGGCTTGTCATGA 76
ESVCVTGASGFIGSWLVM

222-A-3
GAAAGTG-TTGCGTTACAGGAGCCTCTGGTTTCATCGGGTCATGGCTTGTCATGA 75
ESVALQEPLVSSGHGLS*


C

Bay
GGCTACACGGTCCGAGCCACTGTACGCGATCCAGCTAACATGAAGAAG 162
GYTVRATVRDPANMKK

E30-D-1 (BsrGI site: TGTACA)
GGCTACACGGTCCGAGCCACTGTACACGATCCAGCTAACATGAAGAAG 162
GYTVRATVHDPANMKK


D

Fourth exon / fourth intron boundary (the exon ends with ...ATCACAG)

Clark
ACTGCTCTTTCACTAATCACAGGTGCCCTTTATACGTGGATTTTGTTGTCATTTTAA
TALSLIT

Clark-w4
ACTGCTCTTTCACTAATCACAGATGCCCTTTATACGTGGATTTTGTTGTCATTTTAA
TALSLITDALYTWILLSF*

Figure 6 Nucleotide and amino acid polymorphisms of the DFR2 gene in flower color variants of soybean. (A) Amino acid sequence of
Bay and 222-A-3. Amino acids polymorphic in 222-A-3 are shown in bold. NADPH binding domain is underlined. The polymorphic amino acid in
E30-D-1 is shown in red font. End of the fourth exon is indicated by an arrow. (B) Alignment of partial cDNA and amino acid sequences from Bay
and 222-A-3. Polymorphic nucleotides are shown in red font. The polymorphic amino acids in 222-A-3 are shown in bold. (C) Alignment of partial
cDNA and amino acid sequences from Bay and E30-D-1. Polymorphic nucleotides and amino acids are shown in red font. Restriction site used for
CAPS analysis is underlined. (D) Alignment of partial nucleotide and amino acid sequences around the end of the fourth exon of Clark and the
corresponding region of Clark-w4. Polymorphic nucleotides are shown in red font. Nucleotides common in the 5' splice site are underlined.
Amino acids polymorphic in Clark-w4 are shown in bold.



may not be the cause of the flower color change, because
Tgm9 resides in the intron, and footprints, if any, are
not likely to substantially affect gene expression. Instead,
re-insertion into the promoter region is more likely
responsible. The upstream promoter regions of structural
anthocyanin biosynthesis genes contain cis-regulatory
elements that affect pigmentation patterns or intensity [5]. It
would be interesting to determine the role of the 9-bp difference in
the Tgm9 integration site in the expression of the DFR2
gene. Detailed promoter assays may be necessary to
identify the cis-element regulating the expression of this gene.

The DFR2 gene of soybean controls the intensity and
distribution of pigmentation in flower petals. Mutations of the
gene result in unique flavonoid compositions and a wide
variety of flower color patterns, ranging from near white, light
purple and dilute purple to pale.
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undigested digested 
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Figure 7 Results of CAPS analysis of DFR2 gene in soybean. (Upper panel) Results of CAPS analysis for flower color variants. PCR products 
amplified with CAPS primers were digested by BsrGI and the digests were separated on an 8% polyacrylamide gel. φ, molecular marker φX174/
HaeIII; B, Bay; C, Clark; C4, Clark-w4; 22, 222-A-3; E3, E30-D-1. (Lower panel) Results of CAPS analysis in an F2 population derived from a cross
between E30-D-1 and Clark-w4. φ, φX174/HaeIII; C4, Clark-w4; E3, E30-D-1; H, F2 plants segregating for flower color; L, F2 plants fixed for light
purple flower; N, F2 plants fixed for near white flower. The migration of size markers is shown to the left of the gel.



Conclusions 

The flower colors of 222-A-3, Clark-w4, E30-D-1,
kw4, T321 and T369 were controlled by the W4 gene
encoding DFR2. In 222-A-3, a single-base deletion
probably produced a truncated polypeptide consisting
of 24 amino acids. In Clark-w4, base substitution of
the first nucleotide of the fourth intron abolished the
5' splice site, resulting in the retention of the intron.
The DFR2 gene of kw4 was not expressed. The above
results suggest that complete loss-of-function of the DFR2
gene leads to near white flowers. The flower color of E30-D-1
was controlled by a new allele of the W4 locus,
w4-lp. In E30-D-1, a single-base substitution changed
the amino acid at position 39 from arginine to histidine.
In T369, expression of the DFR2 gene was 2.3 times
that of purple flowers, and the flower petals contained
unique dihydroflavonols that are absent in other
G. max and G. soja accessions. Thus, mutations of the
DFR2 gene result in unique flavonoid compositions
and a wide variety of flower color patterns in soybean,
ranging from near white, light purple and dilute purple
to pale.




Figure 8 Expression of the DFR2 gene relative to the cultivar Bay in flower petals of soybean and Glycine soja. Lines shown, left to right: Bay, 222-A-3, E30-D-1, Clark-w4, T321, T369 and kw4. Transcript levels were
standardized to the transcript level of actin. The means and SDs from three biological replications are shown.
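
The arithmetic behind a plot of this kind (standardization to actin, scaling to Bay, then mean and SD over three replicates) can be sketched as follows. This is only an illustration under stated assumptions: the input numbers are made-up placeholders in arbitrary units, not measurements from this study, the function name `relative_expression` is hypothetical, and the source does not state how the underlying transcript signals were quantified.

```python
from statistics import mean, stdev


def relative_expression(target, reference, calibrator_mean):
    """Return (mean, SD) of target/reference ratios scaled to a calibrator line.

    target, reference: per-replicate signals for the gene of interest and
    for the reference gene (actin in the figure).
    calibrator_mean: mean target/reference ratio of the calibrator line (Bay).
    """
    normalized = [t / r for t, r in zip(target, reference)]
    relative = [n / calibrator_mean for n in normalized]
    return mean(relative), stdev(relative)


# Placeholder values in arbitrary units; NOT data from the study.
bay_dfr2 = [10.0, 12.0, 11.0]
bay_actin = [100.0, 100.0, 100.0]
line_dfr2 = [20.0, 24.0, 22.0]
line_actin = [100.0, 100.0, 100.0]

bay_ratio_mean = mean(t / r for t, r in zip(bay_dfr2, bay_actin))
bay_mean, bay_sd = relative_expression(bay_dfr2, bay_actin, bay_ratio_mean)
line_mean, line_sd = relative_expression(line_dfr2, line_actin, bay_ratio_mean)

print(round(bay_mean, 3), round(bay_sd, 3))    # the calibrator averages 1.0 by construction
print(round(line_mean, 3), round(line_sd, 3))
```

With this scaling the calibrator line always averages 1, so a value such as 2.3 for another line reads directly as a 2.3-fold difference in the standardized transcript level.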

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Additional file 1: Figure SI. Alignment of the 5' upstream region of 
DFR2 gene in soybean cultivar Clark and a Glycine soja accession kw4. 
Polymorphic nucleotides are shown in red font. Coding sequence is 
underlined. 
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